Global 3-hourly wind-wave and swell data for wave climate and wave energy resource research from 1950 to 2100

Ocean wave climate, including wind waves and swells, is essential to human marine activities and global or regional climate systems, and is highly related to harnessing wave energy resources. In this study, a global 3-hourly instantaneous wave dataset was established with the third-generation wave model MASNUM-WAM and wind forcings derived from the products of the First Institute of Oceanography-Earth System Model version 2.0, the climate model coupled with wave model, under the unified framework of the Coupled Model Intercomparison Project phase 6. This dataset contains 17 wave parameters, including the information associated with wave energy and spectral shape geometries, from one historical (1950–2014) simulation and three future (2015–2100) scenario experiments (ssp125, ssp245, and ssp585). Moreover, all the parameters can be accessed separately in the form of wind waves and swells. The historical results show that the simulated wave characteristics agree well with satellite observations and the ERA5 reanalysis products. This dataset can provide the community with a unique and informative data source for wave climate and wave energy resource research.


Background & Summary
Ocean waves, arriving from specific wind events that are occurring locally (wind waves) or have occurred somewhere else on the sea surface (swells), can reach tens of meters in height or travel thousands of miles and can bring serious threats to various marine activities, such as sea voyages 1-3 , ocean fishing 4,5 , and oil exploitation [6][7][8] . Ocean waves can also be intimately involved in the energy and material exchange between the atmosphere and ocean, playing a crucial role in global and regional climate systems 9,10 . Therefore, understanding the wave climate and change is valuable for offshore engineering structure safety, shoreline protection, global warming prevention, etc. Moreover, harnessing wave energy (the most concentrated and high-available source of marine renewable energy with great potential for exploitation 11,12 ) needs to consider its annual or seasonal spatial distributions and temporal variability, which is highly correlated with the wave climatology of the target area.
In studies on wave climate, especially in those interested in future climatic scenarios, simulation with numerical wave models is the main method used. With specific wind forcings, the simulation can generate long-term wave parameters with continuous coverage in space and time, which can be applied in further analyses. A series of wave datasets are proposed to support such research, e.g., the earlier proposed ERA-Interim 13 provides the basic bulk wave parameters, such as significant wave height, mean wave period, and mean wave direction. The later presented ERA5 reanalysis [14][15][16] and EMC/NCEP wave hindcast 17 datasets can further exhibit those wave parameters above in the forms of wave spectral partitions, i.e., wind waves and swells, and the ERA5 dataset can also provide the parameters associated with spectral shape geometries, which can be adopted in extreme wave event analyses [18][19][20][21] . In addition to the reanalysis and hindcast datasets, wave data products from global and regional climate models can be helpful in understanding the response of the global wave climate to both historical and future climate change. For example, the First Institute of Oceanography-Earth System Model version 2.0 (FIO-ESM v2.0) 22

Methods
This dataset was established with the third-generation wave model MASNUM-WAM and wind forcings derived from the products of FIO-ESM v2.0 under the framework of CMIP6. In this section, we introduce the wave model and the wind forcings adopted in this study. Brief introductions to calculating wave parameters and identifying wind waves and swells are also presented. [29][30][31][32] is a third-generation wave model developed by the Key Laboratory of MArine Science and NUmerical Modeling (MASNUM), FIO of MNR (Ministry of Nature Resources) of China. MASNUM-WAM solves the energy spectrum balance equation in wavenumber space and uses a complicated characteristic inlaid scheme 29 in spherical coordinates 31 to perform shoaling and refraction effects in shallow waters, the modulation of background current to wave evolution, and the refraction of waves propagating along great circles.

MASNUM-WAM and modeling configuration. The MASNUM-WAM (formerly LAGFD-WAM)
In this work, the ST6 source function package 33-35 is adopted to simulate the effects of wind input, white-capping dissipation, and swell dissipation on the evolution of waves, and the DIA 36,37 scheme is adopted to calculate the nonlinear energy transfer between waves. A global computational grid is used in the simulation, covering the region from 80°S to 80°N and 0°(360°)E to 359°E with a 1°×1° horizontal resolution. The modeling spectral space is set as 24 directions with intervals of 15° and 35 wavenumbers spaced logarithmically from the minimum of 0.0071 up to 4.6341 with intervals of k i+1 /k i = 1.21, which are equivalent to frequencies from 0.042 Hz to 1.073 Hz with a ratio of 1.1 at infinite depth. Finally, bathymetric data are obtained from ETOPO1 38 of the National Oceanic and Atmospheric Administration (NOAA) National Geophysical Data Centre (NGDC).  [29][30][31][32], and a sea ice model (Los Alamos sea ice model version 4 42 ). FIO-ESM v2.0 has considered four distinctive physical processes, including nonbreaking surface wave-induced vertical mixing 43,44 , the effects of Stokes drift on momentum and heat fluxes, the effects of sea spray on heat flux, and the SST diurnal cycle.
Recently, FIO-ESM v2.0 was used to carry out CMIP6 experiments 23 . The wind data products are derived from the CMIP6 historical data, and then, the three future scenario experiments are adopted as driving forcings in this work. The historical simulation represents climate change over the 1850-2014 period, and the future scenario experiments, which belong to the CMIP6-Endorsed Scenario Model Intercomparison Project (ScenarioMIP) 24 , are the projections of future (during 2015-2100) climate change. In this work, the three future scenarios are forced by the latest proposed shared socioeconomic pathways (SSPs), denoted ssp126, ssp245, and ssp585, representing the low, medium, and high ends of the range of future forcing pathways to produce radiative forcings of 2.6 W m −2 , 4.5 W m −2 , and 8.5 W m −2 in 2100, respectively. All four experiments mentioned above were forced by the forcing datasets provided by CMIP6 (https://esgf-node.llnl.gov/search/input4mips/).
The derived wind forcing data are the zonal and meridional wind velocities at 10 m above the sea surface. The horizontal resolution is a finite volume grid (approximately 0.9° × 1.25°), and the time resolution is three hours.
Wave parameters. As a spectral model adopted in the simulation, wave parameters can be obtained by integrating the simulated 2-D wave spectra. The 2-D wavenumber-direction spectra simulated in MASNUM-WAM can be expressed as E(k, θ), which can be easily transformed to the frequency-direction energy spectra S(f, θ) as follows:  Parameters associated with spectral shape geometries can also be obtained directly from the simulated spectra, e.g., the peak frequency f p is defined as the frequency representing the maximum value of S(f); the peak wave direction d p is expressed as follows: The wavelength at the peak wavenumber k p is calculated as follows: p p and k p is associated with f p according to Eq. (3), and the peak wave period T p is defined as the reciprocal of f p and calculated using a parabolic fit around f p . In this work, additional geometries are also provided, such as the Goda peakedness 45 (where a and b are obtained according to Eq. (11)). Moreover, wave parameters, which can be applied to wave energy assessment and characterization, are also presented. According to the Technical Specification proposed by the International Electrotechnical Commission 48 (IEC TS 62600-101:2015; hereafter IEC2015), the (omni-directional) wave power density (WPD) can be estimated as follows: where ρ (taken as 1023 kg/m 3 in this work) denotes the density of seawater, g = 9.81 m/s 2 is the acceleration of gravity, and C g is the group velocity of waves in Eq. (2). IEC2015 also recommends directionally resolved WPD, i.e., resolving omni-directional WPD in a specific direction θ j : In this work, θ j are assigned to the 24 discrete directions in the spectral space, and the maximum value of ) and the corresponding direction θ j,max are retained for each simulated spectrum. Furthermore, to measure the relative spread of wave energy in the f and θ directions, IEC2015 recommends the following coefficients: max , j respectively.
Finally, the wind-sea fraction (WSF) parameter is introduced to characterize the proportion of wind wave energy contained in each spectrum and is presented as follows: 49,50 is the energy in the spectral space for which the projected wind speed U p is larger than the local wave phase velocity c. The parameter U p can be calculated as follows: where U 10 denotes the wind speed at the height of 10 m above the sea surface, δ denotes the angle between the direction in the spectral space and the direction in which the wind is blowing, and C mult is a coefficient set as 1.7 in this work. In the spectral space, wave phase velocity c is associated with frequency f and wavenumber k; thus, according to Eq. (3), Identification of wind waves and swells. The spectral partitioning technique was introduced to demonstrate the historical and future wave characteristics in the forms of wind waves and swells. The spectral partitioning technique can be traced back to a digital image processing watershed algorithm 51 , which can be adopted to identify watershed lines, mountain peaks, and valleys in topographic maps. Because the 2D spectrum resembles a topological surface, it is logical to apply such an algorithm in this circumstance 52 . As described by Hanson and Phillips 49 , the basic approach to the spectral partitioning method is that by searching through the spectral matrix S(f, θ), the paths of steepest ascent leading to each peak or local energy maximum can be identified; then, all paths (2023) 10:225 | https://doi.org/10.1038/s41597-023-02151-w www.nature.com/scientificdata www.nature.com/scientificdata/ leading to the same peak can be grouped, and the members that lie on the collection of the paths are considered to belong to a distinct partition. Partitioning of wave spectra is widely adopted in research concerning data assimilation 52-54 , spatial and temporal tracking of wave systems 49,55 , and so on.
Notably, the wave parameters mentioned above can be calculated not only from the entire spectrum but also from a partition of it. In each partitioned spectrum, partitions whose WSF are greater than 33.33%, together with spectral elements (f, θ) whose phase velocities (Eq. (24)) are less than the local projected wind speeds (Eq. (23)), are combined as a new partition, and the newly formed partition is considered to be under the direct influence of the wind and thus is identified as the wind wave system; the remaining partitions (WSF < 33.33%), including those incomplete partitions that have contributed some elements to the wind wave system, are then identified as individual swells. The swells in the same spectrum can be combined as a total swell partition.
The partitioning and identification program implemented in this work was developed based on the W3PARTMD module of WaveWatch III ver. 6.07 56 , in which an efficient FORTRAN routine was transformed from the MATLAB code 57,58 that was used to apply the watershed algorithm 51 . The wave parameters provided in this dataset are calculated from both the entire spectra and its wind wave and swell partitions; see the Data Records section for more details.

Data Records
This dataset consists of up to 17 kinds of wave parameters, and wind speeds at 10 meters above the sea surface (toward the east and north) are also provided. The dataset covers the area of 0°E-359°E, 80° S-80° N with spatial intervals of 1°, and the temporal intervals are 3 hours. The 17 wave parameters presented in this dataset are integrated from both the entire spectra (i.e., combined wind waves and swells, hereafter COMB) and their partitions, and the partitions are presented in the forms of wind waves (WSEA), total swells (TSWL), and the first three swells with the largest H s (SWL1, SWL2, and SWL3). Moreover, the wave dataset spans a 65-year historical period (1950-2014) and three 86-year future scenarios (2015-2100).
The data mentioned above are stored monthly for each wave and wind parameter, and the filenames of the data are in the following format: <para_id>_<exp_id>_<yyyymm>.nc, where <para_id> denotes the name of parameters, see Table 1; <exp_id> represents the name of the CMIP6 driving conditions, which are 'histor' , 'ssp126' , 'ssp245' , and 'ssp585'; and <yyyymm> are expressed as 195001-201412 for the 'histor' data and as 201501-210012 for the three future scenarios. Since there are 780 and 1032 months during 1950-2014 and 2015-2100, respectively, and there are 17 wave parameters and 2 wind parameters to be exhibited, the number of files in the historical catalog is 780 × 19 = 14820, and the number of files for each future scenario is 1032 × 19 = 19608.
All data files are provided in NetCDF format and are archived in the ScienceDB 59 . The variables in each file, together with their descriptions, dimensions, and units, are outlined in Table 1. The dimension of time is presented as days since 1950-01-01 00:00:00; the dimensions of latitude and longitude are expressed as degrees north and east, respectively; and variables associated with waves contain the 'npt' dimension, and npt from 1 to 6 indicate variables derived from the COMB, WSEA, TSWL, and SWL1-3, respectively. In this dataset, '_FillValue' denotes land points; in particular, wave-associated variables may present a negative value, indicating that the spectrum or the spectral partition from which the parameter is obtained contains much less energy, such that the corresponding H s is less than 0.05 m.

Technical Validation
The MASNUM-WAM has been calibrated and adopted many times in previous scientific and engineering studies (e.g. [60][61][62][63][64][65] ); moreover, MASNUM-WAM is now the ocean wave component of several operational ocean forecasting systems (OFS), such as the OFS for the seas of China and adjacent areas 66 , OFS for Southeast Asian Seas and OFS for the 21st-Century Maritime Silk Road 67 . Therefore, validation of the MASNUM-WAM is not shown in this study.
The validation of FIO-ESM v2.0 can be referred to in the work of Bao et al. 22 , in which FIO-ESM v2.0 was applied to conduct the CMIP6 DECK (Diagnostic, Evaluation and Characterization of Klima) and historical (1850-2014) experiments 23 . The results show that the time evolutions of surface air temperature, sea surface temperature, and Atlantic meridional overturning circulation in the past centuries are well reproduced; in particular, the common, large, warm sea surface temperature bias for all climate models is dramatically reduced, and the simulated El Niño-Southern Oscillation period is much closer to the observation within 2-7 years. Therefore, it is suggested that the performance of FIO-ESM v2.0 under the CMIP6 experimental framework is stable and reliable, including in both the historical and future scenarios. Moreover, Song et al. 25 performed the www.nature.com/scientificdata www.nature.com/scientificdata/ validation of the FIO-ESM v2.0 wave product in the CMIP6 historical experiment. In the comparison against the ERA5 reanalysis data from 1979-2014, the monthly mean H s , θ m , T p , and T m02 show good agreement in terms of the basic characteristics of spatial pattern and seasonal variation, as well as the 99th-percentile values of H s derived from the 3-hourly data.
As the aim of this dataset is to aid in wave climate and wave energy resource research, and as most of the research characteristics of interest are closely related to the parameters of H s and T e , the validation focuses on the climatology of the two key parameters, denoted as Hs-MAS and Te-MAS, in this dataset. In addition, the quality of wind forcing adopted to force MASNUM-WAM is also to be validated, and we focus on the wind speed (WS), denoted as WS-MAS, in this section.
The AVISO gridded wind and wave products 68 are selected as the observation baseline to be compared with WS-MAS and Hs-MAS. The AVISO products comprise the daily WS and H s observations, denoted as WS-OBS and Hs-OBS, respectively, merged from a set of missions, such as Envisat, Jason-1-3, AltiKa, and Sentinel-3A, and it covers 0°E to 359°E, 90°S to 89°N with a horizontal resolution of 1° × 1°. Notably, to be comparable with the MASNUM-WAM simulated results in the historical scenario, the sampling periods are 2013-2014 and 2010-2014 for WS-OBS and Hs-OBS, respectively. To validate WS-MAS with a longer sampling period and to validate Hs-MAS and Te-MAS in the forms of wind waves and swells separately, the 'ERA5 hourly data on single levels from 1959 to present' dataset 69 is adopted. The ERA5 hourly data can provide 10m u-v winds and separate COMB, WSEA, and TSWL wave characteristics, covering 0°E-360°E, 90°S-90°N with spatial intervals of 0.25° and 0.5° for the wind and wave parameters, respectively; to be comparable with Hs-MAS and Te-MAS, the original spatial and temporal resolutions are reduced and the sampling period is selected as 2010-2014. Then, the employed ERA5 WS, H s and T e products are denoted as WS-ERA, Hs-ERA and Te-ERA, respectively. Finally, to perform a more general validation, a global ensemble of ocean wave climate statistics 28 is introduced. As a product of the COWCLIP2 10,28 , the statistics mentioned above comprise 14 contemporary wave reanalysis and hindcasts computed across 1980-2014, including general and extreme statistics of H s , mean wave period (such as T m01 and T m02 ), and θ m at different frequency resolutions (monthly, seasonally, and annually). We employed the mean values of annually averaged H s , denoted as Hs-COW, in the statistics as the baseline to validate the historical Hs-MAS. Similarly, the spatial resolution of Hs-COW has been adjusted to that of Hs-MAS. Furthermore, in the comparisons below, four quantitative errors are also exhibited: the Pierson's correlation coefficient R

Comparisons of WS-MAS against WS-OBS and WS-ERA. Comparisons of WS-MAS against WS-OBS
and WS-ERA can validate the quality of the forcing wind. Figure 1 illustrates the climatological distributions of WS-MAS and WS-OBS in boreal winter (December-January-February, DJF, panels a,b), spring (March-April-May, MAM, panels c,d), summer (June-July-August, JJA, panels e-f), and autumn (September-October-November, SON, panels g-h) during 2013-2014, as well as the annual (ANN, panels i-j) mean result. The quantitative errors between the seasonal-or annual-averaged WS-MAS and WS-OBS are also exhibited in the same rows of the corresponding panels. Figure 1 shows that WS-WAM and WS-OBS can achieve a strong level of agreement around the world. The distribution patterns of the two characteristics are quite similar for both seasonal and annual mean statistics. The mean value of quantitative error B can even be 0.00 m/s when all the samples are involved.
The climatological distributions of averaged WS-MAS and WS-ERA (panels a-b), together with the 95-th percentile of the two characters (panels c-d), are illustrated in Fig. 2, with the corresponding quantitative errors exhibited in panels b and d, respectively. The sampling period is from 2010 to 2014. As shown in Fig. 2, the distribution patterns of WS-MAS and WS-ERA can match strongly in both mean and extreme conditions, although understandably, the quantitative errors in extreme conditions are slightly higher than those in mean conditions. Therefore, we can conclude that the quality of the forcing wind derived from the FIO-ESM v2.0 product is robust and reliable.

Comparison between Hs-MAS and Hs-OBS. A comparison between Hs-MAS and Hs-OBS can be used
to assess the seasonal and annual mean state of the simulated H s in spatial distribution patterns. The climatological distributions of averaged Hs-MAS and Hs-OBS in boreal winter (December-January-February, DJF, panels a,b), spring (panels c,d), summer (panels e,f), and autumn (panels g,h) during 2010-2014, together with the annual (panels i,j) mean result, are illustrated in Fig. 3. The quantitative errors between the seasonal-or annual-averaged Hs-MAS and Hs-OBS are also exhibited in the same rows of the corresponding panels. Figure 3 shows that the comparison against the satellite-observed H s has a good agreement. The seasonaland annual-averaged Hs-MAS and Hs-OBS distribution patterns are very similar. The differences are generally

Comparisons of Hs-MAS and Te-MAS against Hs-ERA and Te-ERA. Comparisons of Hs-MAS and
Te-MAS against Hs-ERA and Te-ERA demonstrate the performance of the simulated H s and T e in the forms of wind waves and swells separately. Figure 4 illustrates the climatological distributions of annually averaged Hs-MAS and Hs-ERA in the forms of COMB (panels a,b), WSEA (panels c,d), and TSWL (panels e,f), with the quantitative errors exhibited in the same rows of the corresponding panels. In addition to the mean state, extreme conditions, i.e., the 95th-percentile values of Hs-MAS and Hs-OBS derived from 2010-2014, are presented in Fig. 5, where the panels and quantitative errors are arranged similarly to those in Fig. 4. Figure 4 shows that the mean states of COMB Hs-MAS coincide well with those of COMB Hs-ERA, where the four errors are close to those exhibited in the comparison between the annual mean Hs-MAS and Hs-OBS. The consistency of the two datasets in WSEA is even better, but a larger difference can be found when considering TSWL conditions; nevertheless, quantitative errors between TSWL Hs-MAS and TSWL Hs-ERA still suggest acceptable goodness of fit. For extreme waves shown in Fig. 5, the comparisons also exhibit good agreements; the R coefficients continue at the high levels that have been found in Fig. 4, and the values of B, MAE and RMSE in Fig. 5 become larger due to higher wave heights involved in the statistical procedure. Differences in the spatial distribution of both Figs. 4, 5 are similar to Fig. 3. Hs-MAS is larger than Hs-ERA in the seas south of 60°S for both WSEA and TSWL conditions, and TSWL Hs-MAS is even larger than TSWL Hs-ERA by www.nature.com/scientificdata www.nature.com/scientificdata/ approximately 0.6-0.8 m in both mean and extreme conditions. In the Arctic Ocean, Hs-MAS is smaller than Hs-ERA, mainly due to the smaller estimated TSWL of Hs-MAS. In the North Atlantic, Hs-MAS is generally smaller than Hs-ERA for the mean state, but for extreme conditions, the former is larger than the latter in the eastern part of the ocean.
The comparisons between Te-MAS and Te-ERA for the mean and extreme conditions are shown in Figs. 6, 7, respectively. Panels for the wave patterns of COMB, WSEA, and TSWL are illustrated in the same way as Figs. 4, 5, as well as the quantitative errors. Parameters associated with wave periods may be influenced markedly by spectral shapes; thus, the simulated T e in the two datasets might not easily achieve consistency, especially when spectral partitioning is conducted. Figure 6 shows that Te-MAS is larger than Te-ERA in almost all oceans around the world, including both WSEA and TSWL patterns, but the extreme bias is no more than 1.8 s. TSWL Te-MAS is smaller than TSWL Te-ERA by approximately 0.2-0.4 s in the North Atlantic, resulting in a smaller COMB Te-MAS in the same location. For the extreme conditions shown in Fig. 7, the differences between Te-MAS and Te-ERA are reduced for both WSEA and TSWL conditions, although the Te-MAS is still larger than the Te-ERA by approximately 1 s; it is noted that Te-MAS is estimated to be smaller than Te-ERA in the North Atlantic for both WSEA and TSWL wave patterns. Figure 8 illustrates the climatological distributions of the averaged Hs-MAS (left-column) and Hs-COW (right-column) for the annual mean (panels a-b) and 95-th percentile (panels c-d) statistics. The corresponding quantitative errors are exhibited in panels b and d, respectively.

Comparisons between Hs-MAS and Hs-COW.
Good agreement between the statistics of Hs-MAS and Hs-COW can still be found in Fig. 8. The deviations in spatial distribution for both mean and extreme conditions are very similar to the corresponding panels in Figs. 4, 5, as well as the quantitative errors. In addition, Hs-MAS is also smaller than Hs-COW in the North Atlantic by approximately 0.4 m for the annual mean values and by approximately 1.0 m when the 95-th percentiles are considered.
Overall, the above analyses indicate that the mean state of WS, H s , and T e proposed in the newly established dataset can capture the basic characteristics of the satellite observations in seasonal and annual spatial distributions and can also be broadly consistent with the ERA5 products in both forms of WSEA and TSWL and both mean and extreme wave conditions. The comparisons against the more general wave statistics produced through COWCLIP2 confirm the conclusions mentioned above. However, the simulated H s and T e may still suffer biases, especially in the southern 60°S, Arctic, and North Atlantic Oceans.    (panels a,b) and 95-th percentile (panels c,d) statistics, with the corresponding quantitative errors exhibited in panels b and d, respectively. The sampling period is from 1980 to 2014.